WIP: exploratory branch for fast array views #5003
Conversation
We used to have a `Buffer` type.
I do recall. Then we decided tuple, buffer and array were just too many vector-like types to explain to people. I'm thinking more of something that you use to implement other things, not a "user-facing" data type, if you see what I mean.
There was also the performance issue of having all the metadata associated with arrays in a composite type, IIRC. That may no longer be the case with immutable types.
Yes, it might be more reasonable now. But in any case, it's not necessary for this. The thing that made me think of it is that this currently is written specifically to be a view of a vector, not a multidimensional array. I could make the type more general, but then I'd have to add a gratuitous type parameter for the parent array, which is annoying. I could also construct views of multidimensional arrays by reshaping them to 1-d and then taking a view of that, but that's a bit weird.
LLVM does that automatically on x86 :-) See the native instructions here: c980bae – only div, no mod.
Nice. I'll have to brush up on my assembly to really understand what's happening, but nice!
Surprisingly, this branch doesn't seem to fix the performance problem, even though I can verify that the assembly is shorter. On this branch:
On master:
I've updated that gist to make sure that any differences in inlining (which seems to be prevented by assigning the output to a variable) are compensated. The assembly by gcc is here.
Oh, and the profile suggests that the …
Hmm. Maybe I'll have to ponder hard how we might be able to get rid of the div. It's a really hard problem, though. I'm not sure it's even possible.
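For intuition on where that div comes from, here is a sketch (in Python, with hypothetical names; not code from this branch) of the arithmetic that linear indexing into a strided 2-d view requires. Python's `divmod` returns quotient and remainder together, much as x86's `idiv` produces both in a single instruction:

```python
# Sketch: why linear indexing into a strided 2-d view needs integer
# division. Names are hypothetical, not Julia Base or this branch's API.

def view_offset_2d(linear_index, view_shape, view_strides, first):
    """Map a 1-based linear index into a 2-d view to an offset in the
    parent vector. divmod yields quotient and remainder together,
    mirroring how x86 idiv produces both in a single instruction."""
    m, _n = view_shape        # dimensions of the view (column-major)
    s1, s2 = view_strides     # strides (in elements) into the parent
    q, r = divmod(linear_index - 1, m)  # the unavoidable div
    return first + r * s1 + q * s2

# A 3x4 view starting at parent offset 10 with strides (1, 7):
# linear index 1 stays at 10; index 4 starts the second column at 17.
print(view_offset_2d(1, (3, 4), (1, 7), 10))  # 10
print(view_offset_2d(4, (3, 4), (1, 7), 10))  # 17
```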
Here's a perhaps-interesting observation: with the …
@timholy I suspect that was with …
You're right that the gist did not include that optimization, but I noticed that too and tested it locally. As we noted before, it definitely makes a difference, but only about 10%. Performance-wise, there's still a 3-4x gap between non-inlined performance with 2d arrays vs. subarrays in Julia. The two differences you noted (nice detective work, thanks!) seem like potential explanations of the performance gap. One test might be to compare to Cartesian indexing: …
So there's a 2.25-fold gap for Cartesian indexing. Cartesian indexing has the same issues you point out for linear indexing, but the gap is smaller. The tenfold gap for linear indexing, by comparison to Arrays, will hopefully be mostly fixed by @vtjnash's inlining work? One last thing to note is the puzzling allocation of memory for linear-indexing SubArrays.
One simple observation, though: I get the feeling that most uses of linear indexing involve simple iteration over the array (view) elements. It should be a lot simpler to do efficient linear iteration as compared to linear random-access indexing.
I have felt for a while that we need a different idiom for iterating over arrays than linear indexing. Linear indexing only makes sense for that application for simple dense arrays.
I agree too; of course, a natural alternative is Cartesian indexing. Basically, Cartesian.jl was born in part to handle this issue with the performance of SubArray indexing. (It's more than that, but it was a major motivation.) Hence #1917 will almost certainly quickly become important in any discussion of alternatives to linear indexing.
+1 I'm very excited about this. This is half of what I do all day.
- only supports Vector{T} as the data array
- multidimensional indexing via an arbitrary linear transform
- linear indexing compatible with the multidimensional indexing
I'm not even sure why we were generating code to check for a zero denominator – this seems to still raise an exception as expected. The other special case just isn't worth emitting more than one instruction for div – and div/rem, since x86 computes both together.
This version lets you see how tight the code gen can be without the gratuitous (now-removed) checks from the div intrinsic and the undef checks from loading the fields from the ArrayView structure:

```
julia> code_native(view2data,(Int,(Int,Int),(Int,Int)))
	.section	__TEXT,__text,regular,pure_instructions
Filename: /Users/stefan/projects/julia/base/array/view.jl
Source line: 17
	push	RBP
	mov	RBP, RSP
Source line: 17
	mov	R8, QWORD PTR [RSI + 16]
	mov	RSI, QWORD PTR [RSI + 24]
	mov	RAX, QWORD PTR [RDX + 16]
	mov	RCX, QWORD PTR [RDX + 24]
	mov	RCX, QWORD PTR [RCX + 8]
	dec	RCX
	imul	RCX, QWORD PTR [RSI + 8]
	mov	RAX, QWORD PTR [RAX + 8]
	dec	RAX
	imul	RAX, QWORD PTR [R8 + 8]
	add	RAX, RDI
Source line: 19
	lea	RAX, QWORD PTR [RCX + RAX + 1]
	pop	RBP
	ret

julia> code_native(view2data,(Int,(Int,Int),(Int,Int),Int))
	.section	__TEXT,__text,regular,pure_instructions
Filename: /Users/stefan/projects/julia/base/array/view.jl
Source line: 23
	push	RBP
	mov	RBP, RSP
	mov	R8, RDX
Source line: 23
	lea	RAX, QWORD PTR [RCX - 1]
Source line: 25
	mov	RCX, QWORD PTR [RSI + 16]
	mov	R9, QWORD PTR [RSI + 24]
	cqo
	idiv	QWORD PTR [RCX + 8]
	mov	RCX, RDX
	mov	RDX, QWORD PTR [R8 + 16]
	mov	RSI, QWORD PTR [R8 + 24]
	imul	RCX, QWORD PTR [RDX + 8]
	add	RCX, RDI
	cqo
	idiv	QWORD PTR [R9 + 8]
	imul	RDX, QWORD PTR [RSI + 8]
Source line: 28
	lea	RAX, QWORD PTR [RDX + RCX + 1]
	pop	RBP
	ret
```
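As a plain-language gloss on the first listing above (the dec/imul/add sequence), here is a sketch in Python of the Cartesian-indexing arithmetic; the function name and calling convention are illustrative, not the branch's actual API:

```python
# Sketch of the arithmetic in the first listing above: Cartesian indexing
# into a view is only multiply-adds (dec/imul/add), with no division.
# Function name and conventions are illustrative, not the branch's API.

def view_offset(first, indexes, strides):
    """1-based Cartesian indexing: offset = first + sum((i - 1) * stride)."""
    off = first
    for i, s in zip(indexes, strides):
        off += (i - 1) * s    # dec, imul, add per dimension
    return off

print(view_offset(1, (2, 3), (1, 10)))  # 1 + 1*1 + 2*10 = 22
```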
Also support views into multidimensional arrays. This wasn't all that annoying, actually; I realized I could just add an `m` parameter.
```
julia> include(joinpath(JULIA_HOME,"../../base/array/view.jl"))
benchmark (generic function with 4 methods)

julia> results = benchmark();

julia> data = squeeze(mapslices(median,results[:,:,2:3]./results[:,:,1],2),2)
10x2 Array{Float64,2}:
  6.64002     2.99833
  6.69273    16.3886
  8.59361    29.9632
 10.4218     48.2432
 12.1854     65.8061
 13.2618   2357.6
 16.1612   2534.79
 17.3616   2367.67
 17.1718   2754.53
 20.8452   3254.62

julia> data./[1:size(data,1)]
10x2 Array{Float64,2}:
 6.64002    2.99833
 3.34636    8.19429
 2.86454    9.98773
 2.60545   12.0608
 2.43708   13.1612
 2.21029  392.934
 2.30875  362.113
 2.1702   295.958
 1.90797  306.059
 2.08452  325.462
```
unfortunately, this takes a damned long time to run because the multidimensional linear SubArray indexing is *soooo* slow.
@lindahua Just pinging you on this thread, in case you haven't seen it already.
Yes, that's about right. The improvement is good but I'm still a bit sad that I can't get it even closer to direct linear indexing. It's pretty damned hard to compete with just lea + fetch.
I do feel that we need to get array views as fast as array indexing. Perhaps we need to hack some support deep in the internals, but it would be worth it.
I'm not entirely sure it can be done, regardless of how internal you get. There's a certain amount of arithmetic that has to be done to turn a linear index into an offset into the original array, and there's a fair amount of non-linearity to it, so you can't just collapse the computations down all the way. But we may be able to get it within 2x if we're very careful.
Can't we get the offset with …
You can do that, but you still have to do more arithmetic. Like I said, the relationship between the input index and the offset into the original array is not a simple linear relationship.
In particular, the amount of work you need to do is proportional to the number of dimensions.
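To make that concrete, here is a hedged Python sketch of the linear-index-to-offset computation: one `divmod` per dimension, so the work grows with the number of dimensions (the helper name and conventions are hypothetical, not Base API):

```python
# Concrete version of the claim above: converting a linear index into an
# offset costs one divmod per dimension, so the work grows with ndims.
# Hypothetical helper, not Base API.

def linear_to_offset(linear_index, view_shape, view_strides, first):
    """1-based linear index into a strided view -> parent offset."""
    off = first
    rest = linear_index - 1
    for size, stride in zip(view_shape, view_strides):
        rest, sub = divmod(rest, size)   # one div per dimension
        off += sub * stride
    return off

# 2x3x4 view at offset 1 with strides (1, 5, 40) into the parent:
print(linear_to_offset(1, (2, 3, 4), (1, 5, 40), 1))  # 1
print(linear_to_offset(7, (2, 3, 4), (1, 5, 40), 1))  # 41
```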
Ok, those undef checks are really killing us. I just manually hoisted the lookups for those out of the benchmarking loop and got this:

```
julia> squeeze(mapslices(median,10^9*results./sizes,2),2)
10x2 Array{Float64,2}:
 1.93126     6.10509
 1.47899     6.98912
 1.40065    13.3511
 1.50102    56.747
 1.49355    78.5422
 1.52309    93.0085
 0.421136   33.0249
 0.247773   21.4358
 1.76813   131.094
 1.5015    175.582

julia> squeeze(mapslices(median,results[:,:,2]./results[:,:,1],2),2)
10-element Array{Float64,1}:
   3.15233
   4.73653
   9.32549
  38.1872
  52.2354
  61.1526
  78.704
  84.6919
  75.6112
 117.906
```

I didn't run the SubArray benchmarks since they're slow and won't have changed. This is pretty respectable performance. I think that with some more performance tweaking and appropriate optimizations, we can get this really close to direct linear indexing, but unless someone comes up with a brilliant trick, there's always going to be a cost to linear indexing into high-dimensional array views.
@StefanKarpinski, this is amazing work. With regards to what is ultimately possible, have you considered making more extensive comparisons against C? In particular, an analysis that I had done earlier seemed to suggest that, for C, the arithmetic (at least for 2d) was inconsequential compared with cache misses. Are you running these tests with arrays too big to fit into L1 cache? Of course, it's also possible that there's something devious going on and I misinterpreted the results of the C code. Finally, there's another major issue: whether …
@StefanKarpinski This is great work. Thanks for taking the lead here. My two cents: …
@timholy: trying out these techniques in C is a good idea, since it will give a good sense of the best performance we can possibly get. I worry that the C compiler might do things that wouldn't be valid for Julia's compilation to do, because it can perform static global analysis that we can't. @lindahua: I think you are absolutely right about contiguous views, and it does seem like there are a lot of situations where we can statically determine that a view should be contiguous. I'll have to think about that a bit more – definitely a huge performance win.
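As a rough illustration of how contiguity could be decided statically, here is a Python sketch that classifies a view by the kinds of its indexes; the classification rule below is an assumption for illustration only, not the actual logic of ArrayViews.jl or Base:

```python
# Rough sketch of deciding contiguity from index *kinds* alone; the rule
# below is an assumption for illustration, not ArrayViews.jl's actual logic.

def view_is_contiguous(index_kinds):
    """index_kinds: one tag per dimension, fastest-varying dimension first.
    'colon' = whole dimension, 'range' = unit-step range, 'int' = scalar.
    Contiguous iff the view takes full leading dimensions followed by at
    most one range, with any remaining dimensions fixed by scalars."""
    kinds = list(index_kinds)
    while kinds and kinds[-1] == 'int':   # trailing scalars pick a slab
        kinds.pop()
    if not kinds:
        return True
    last = kinds.pop()
    return all(k == 'colon' for k in kinds) and last in ('colon', 'range')

print(view_is_contiguous(('colon', 'int')))    # True:  a column a[:, i]
print(view_is_contiguous(('range', 'colon')))  # False: a[2:5, :]
```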
Although it might add additional challenges, I'd also like to explore …
I definitely agree that leaning so hard on linear indexing is just causing a problem that doesn't need to exist. Now that we have a proposed hierarchy of array-like things, we should try to figure out a better general protocol for iterating over them that will be efficient for a wide range of implementations. I don't think that faster tuples help so much if you mean using tuples as state, since it's not really the multiple pieces of state that we need but the multiple layers of control – we really need nested for loops, each of which has its own piece of state that it can update very simply. More pieces of state with a single control loop isn't any better, because the update step still ends up being very complex – and it still happens on each iteration.
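To illustrate the nested-loops point, here is a Python sketch: each loop level owns one offset that it updates with a single add, so iterating a strided 2-d view needs no division at all (a hypothetical helper, not a proposed iteration protocol):

```python
# Sketch of the nested-loops idea: each loop level owns one offset and
# updates it with a single add, so no division is needed anywhere.
# Hypothetical 2-d helper, not a proposed iteration protocol.

def iterate_view_2d(parent, first, shape, strides):
    """Yield the elements of a strided 2-d view in column-major order."""
    m, n = shape
    s1, s2 = strides
    col = first
    for _ in range(n):       # outer loop owns the column offset
        off = col
        for _ in range(m):   # inner loop bumps by one stride per element
            yield parent[off]
            off += s1
        col += s2

parent = list(range(100))
# 2x3 view at offset 10 with strides (1, 10): columns start at 10, 20, 30.
print(list(iterate_view_2d(parent, 10, (2, 3), (1, 10))))
# [10, 11, 20, 21, 30, 31]
```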
That's a good point, but I was hoping that using a local tuple would allow LLVM's scalar conversion and hoisting to figure out which pieces of the indexing calculations are constant for each loop. We could also end up with each loop variable being an integer, but the programmer seeing a single tuple as each index. We need a …
I agree that more flexible iteration is highly desirable. @StefanKarpinski, on linear indexing and C: depending on how much work it is for you, one way to find out would be to look at the assembly generated by the code in that gist I linked to. That's 2d only, but it would be a start.
Could someone clarify for me the state of the art regarding ArrayView, NumericExtensions.UnsafeVectorView, etc.? I am interested in present facilities and the near-term future. The context is that I am writing a new edition of Bates and Watts, 1988, "Nonlinear Regression Analysis and Its Applications". This will be a large rewrite under the working title of "Nonlinear Regression with R", but I now propose changing that to "Nonlinear Regression with R and Julia". For generality, the model-function evaluation for partially linear models (almost all nonlinear regression models have one or more parameters that occur linearly in the model) has a signature of

```julia
### Michaelis-Menten model for enzyme kinetics
function MicMenmmf(nlpars::StridedVector,x::StridedVector,
                   tg::StridedVector,MMD::StridedMatrix)
    x1 = x[1]
    denom = nlpars[1] + x1
    MMD[1,1] = -(tg[1] = x1/denom)/denom
end
```

because I am using ….

On thinking about it, I am probably better off using duck typing as advocated by Steven Johnson and not using a signature. I still need to decide how to call such a function. All the arrays involved in the evaluation are configured so that the array views passed to the function will be contiguous views. Right now I am leaning toward using the unsafe views from the NumericExtensions package for these, because these are contiguous. Should I expect to continue to use those, or will ArrayViews be preferred?
@dmbates off-topic, but I would avoid using the …. A much better solution is to combine limited and judicious usage of ….
Ok, let me make this more explicit. This stuff should be used with caution, but I don't think it is as risky as @vtjnash suggests. If one uses these views within a local scope (i.e. not passing them around) and carefully ensures that the indices are actually in bounds, there won't be a lot of risk. I introduced unsafe_views basically as a stopgap. For example, if you want to efficiently process columns one by one, ….
When the array views in Julia base become fast enough (which I am eagerly looking forward to), one can just do a search & replace to replace ….
They aren't properly gc-rooted, so it is unsafe to assume that the original array will stick around, and you have to assume that any code that could get called on the array anywhere (esp. code that is expecting out-of-bounds conditions to throw an error) does not have typos. As I said before, if you want your code to require manual memory management and tracking of pointers, then just be honest about it and use C. I generally have a strong dislike of creating stopgaps; I would rather see the effort put into writing correct code and enjoying the speed increases as the underlying issues get addressed.
I agree with your general opinions. However, I don't think using … is as risky as you suggest.

Admittedly, unsafe views are not a terribly elegant solution. I myself don't use them if there are other solutions with comparable performance (within 2x-3x) and productivity. However, when the performance gap is up to 5x-10x or even bigger, I don't see why I would stick to the recommended ways and have my program run 10 times slower. In the rare cases that I use unsafe_views in my code, I strictly follow the pattern below:

```julia
function bar(vec, ...)
    ....
end

function foo( a::Matrix, ... )
    for i = 1 : size(a, 2)
        col = unsafe_view(a, :, i);
        bar(col, ....)
    end
end
```

I never throw the unsafe views around, and the code context makes it clear that the indices are within bounds. I don't see how the use of unsafe views under such restricted conditions would do any harm. I can certainly implement all these in C, but it means that I also have to migrate the related functions (such as …).

Generally, I suggest writing code the safe way in the first version of the implementation, and using unsafe views to improve the performance of critical parts with caution (preferably under the guidance of a profiler).

In terms of efficient array views, that's a complex problem. People have been working on it for months, and still the progress is very limited. Therefore, such stopgaps are introduced to temporarily work around the problem. Sometimes we have to make tradeoffs from a practical standpoint.
The code gen for the core indexing computations – both multidimensional and linear – in c980bae makes me hopeful that we can get array views to be a very fast core component of the language. This approach is really making me wonder if we shouldn't have a simple single-dimensional `Vec` or `Buffer` type that underlies all arrays, in terms of which they can be defined. We could even hide it from sight for the most part, but it makes defining arrays in Julia much more doable.